    Data-efficient reinforcement learning with self-predictive representations

    Data efficiency remains a key challenge in deep reinforcement learning. Although modern techniques can attain high performance in extremely complex tasks, including strategy games such as StarCraft, chess, shogi, and Go, as well as challenging visual domains such as Atari games, doing so generally requires enormous amounts of interaction data, which limits how broadly reinforcement learning can be applied. In this thesis, we propose SPR, a method that draws on recent advances in self-supervised representation learning to enhance the data efficiency of deep reinforcement learning agents. We evaluate this method on the Arcade Learning Environment (ALE) and show that it dramatically improves performance with limited computational overhead. When given roughly the same amount of learning time as human testers, a reinforcement learning agent augmented with SPR achieves super-human performance on 7 out of 26 games, an increase of 350% over the previous state of the art, while also strongly improving mean and median performance. We also evaluate this method on a set of continuous control tasks, where it shows substantial improvements over previous methods. Chapter 1 introduces the concepts needed to understand the work presented, including overviews of deep reinforcement learning and self-supervised representation learning. Chapter 2 describes in detail our contributions towards leveraging self-supervised representation learning to improve data efficiency in reinforcement learning. Chapter 3 presents conclusions drawn from this work, including a number of proposals for future work.
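Results such as "super-human performance on 7 out of 26 games" and "mean and median performance" on Atari are commonly reported in terms of human-normalized scores. The sketch below assumes that convention, (agent - random) / (human - random), and assumes a game counts as super-human when its score exceeds 1.0. The function names, game names, and returns are hypothetical placeholders, not results from the thesis.

```python
from statistics import mean, median


def human_normalized_score(agent_score: float, random_score: float, human_score: float) -> float:
    """Assumed convention: 0.0 matches random play, 1.0 matches the human reference."""
    return (agent_score - random_score) / (human_score - random_score)


def summarize(scores: dict) -> dict:
    """scores maps a game name to a tuple of (agent, random, human) raw returns."""
    hns = [human_normalized_score(*triple) for triple in scores.values()]
    return {
        "mean": mean(hns),
        "median": median(hns),
        # Assumption: a game counts as super-human when its HNS exceeds 1.0.
        "superhuman_games": sum(h > 1.0 for h in hns),
    }


# Hypothetical games and returns, for illustration only.
example = {
    "game_a": (150.0, 50.0, 100.0),  # HNS = 2.0
    "game_b": (75.0, 50.0, 100.0),   # HNS = 0.5
    "game_c": (50.0, 50.0, 100.0),   # HNS = 0.0
}
print(summarize(example))  # mean ≈ 0.83, median 0.5, 1 super-human game
```

Reporting the median alongside the mean matters here because a single game with a very large normalized score can dominate the mean.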

    GAIT: A Geometric Approach to Information Theory

    We advocate the use of a notion of entropy that reflects the relative abundances of the symbols in an alphabet as well as the similarities between them. This concept was originally introduced in theoretical ecology to study the diversity of ecosystems. Based on this notion of entropy, we introduce geometry-aware counterparts for several concepts and theorems in information theory. Notably, our proposed divergence performs on par with state-of-the-art methods based on the Wasserstein distance, but has a closed-form expression that can be computed efficiently. We demonstrate the versatility of our method through experiments in a broad range of domains: training generative models, computing image barycenters, approximating empirical measures, and counting modes. Comment: Replaces the previous version named "GEAR: Geometry-Aware Rényi Information"
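To make "an entropy that reflects abundances and similarities" concrete, the sketch below assumes one standard form from theoretical ecology, the order-1 similarity-sensitive entropy of Leinster and Cobbold: H(p) = -Σ_i p_i log (Kp)_i, where K is a symbol-similarity matrix with ones on the diagonal and entries in [0, 1]. This is an illustration of the general idea, not a claim about GAIT's exact definitions, and the paper's divergence is not reproduced here. The function name `similarity_entropy` is hypothetical.

```python
import math


def similarity_entropy(p: list[float], K: list[list[float]]) -> float:
    """Assumed form: H(p) = -sum_i p_i * log((K p)_i).

    p is a probability vector over the alphabet; K[i][j] in [0, 1] is the
    similarity between symbols i and j, with K[i][i] == 1.
    """
    n = len(p)
    # (K p)_i: how much probability mass sits on symbols similar to symbol i.
    kp = [sum(K[i][j] * p[j] for j in range(n)) for i in range(n)]
    # Zero-probability symbols contribute nothing; for p_i > 0 we have
    # (K p)_i >= p_i > 0, so the logarithm is always defined.
    return -sum(p[i] * math.log(kp[i]) for i in range(n) if p[i] > 0)


uniform = [0.5, 0.5]
print(similarity_entropy(uniform, [[1.0, 0.0], [0.0, 1.0]]))  # Shannon entropy, log 2
print(similarity_entropy(uniform, [[1.0, 0.5], [0.5, 1.0]]))  # -log 0.75, lower
print(similarity_entropy(uniform, [[1.0, 1.0], [1.0, 1.0]]))  # 0.0: symbols indistinguishable
```

With K equal to the identity, this reduces to Shannon entropy. Making symbols more similar lowers the entropy, reaching zero when every pair of symbols is treated as identical.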

    Simplicial Embeddings in Self-Supervised Learning and Downstream Classification

    We introduce Simplicial Embeddings (SEMs) as a way to constrain the encoded representations of a self-supervised model to L simplices of V dimensions each using a Softmax operation. This procedure imposes a structure on the representations that reduces their expressivity for training downstream classifiers, which helps them generalize better. Specifically, we show that the temperature τ of the Softmax operation controls the SEM representation's expressivity, allowing us to derive a tighter downstream classifier generalization bound than that for classifiers using unnormalized representations. We empirically demonstrate that SEMs considerably improve generalization on natural image datasets such as CIFAR-100 and ImageNet. Finally, we also present evidence of the emergence of semantically relevant features in SEMs, a pattern that is absent from baseline self-supervised models. Comment: 22 pages, 5 figures, 5 tables, Preprint
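The constraint described above can be sketched in a few lines. The representation is split into L groups of V values, and a Softmax with temperature τ maps each group onto a simplex. This pure-Python version works on a single flat vector and assumes a contiguous L × V layout. The function name and layout are our assumptions; a real model would apply the same operation to batched tensors inside the network.

```python
import math


def simplicial_embedding(z: list[float], L: int, V: int, tau: float = 1.0) -> list[float]:
    """Split z into L contiguous groups of V values and softmax each group at temperature tau."""
    if len(z) != L * V:
        raise ValueError(f"expected {L * V} values, got {len(z)}")
    out: list[float] = []
    for g in range(L):
        logits = [x / tau for x in z[g * V:(g + 1) * V]]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        # Each group is now non-negative and sums to 1, i.e. a point on a simplex.
        out.extend(e / total for e in exps)
    return out


z = [1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
print(simplicial_embedding(z, L=2, V=3))            # soft distribution per group
print(simplicial_embedding(z, L=2, V=3, tau=0.01))  # first group nearly one-hot
```

Lowering τ pushes each group toward a one-hot vector, which offers one intuition for how the temperature controls the representation's expressivity.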

    Bigger, Better, Faster: Human-level Atari with human-level efficiency

    We introduce a value-based RL agent, which we call BBF, that achieves super-human performance on the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster. Comment: ICML 2023 Camera Ready

    Sonication versus the conventional method for evaluation of the dental microbiome: a prospective pilot study

    Objectives: To investigate sonication as a new tool for the microbiological sampling of dental infections. Methods: A standard sampling method (intraoperative swab) was compared with sonication and with vortexing of the removed tooth in 20 teeth destroyed by caries. Illumina high-throughput sequencing of the 16S rRNA gene was used to assess microbial composition. Antibiotic susceptibility was assigned based on the known resistances of each detected species. Sampling procedures were compared using Bland-Altman analysis, and antibiotic susceptibility using the Friedman test with alpha-adjusted post-hoc analysis. Results: In total, 60 samples were analysed: 20 intraoperative swabs, 20 vortex fluids, and 20 sonication fluids. Sonication fluid yielded the highest number of bacterial sequencing reads of the three procedures. Comparing the operational taxonomic units (OTUs) of the identified bacteria, significantly more OTUs were found in sonication fluid samples. Phylum and order abundances varied between the three procedures. Significantly more Actinomycetales were found in sonication fluid samples than in swab samples. The assigned resistance rates for the identified bacteria (1.79-31.23%) showed no differences between the tested sampling procedures. The lowest resistance rates were found for amoxicillin + clavulanate (3.95%) and levofloxacin (3.40%), and the highest for amoxicillin (30.21%) and clindamycin (21.88%). Conclusions: Using sonication on extracted teeth gives a more comprehensive picture of the resident microbial flora than the standard procedure. If sonication is not available, vortexing is a potential alternative. In immunocompromised patients, especially when actinomycosis is suspected, sonication should be considered for a more detailed microbiological evaluation of the potentially disease-causing microbiome. Because of the high rates of antibiotic resistance, a more targeted antibiotic therapy is preferable.
Levofloxacin should be considered as a first-line alternative to amoxicillin + clavulanate in patients with a penicillin allergy.
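Bland-Altman analysis compares two measurement methods applied to the same subjects. It uses the mean of their paired differences (the bias) and the 95% limits of agreement, bias ± 1.96 × the standard deviation of the differences. The sketch below implements that standard calculation. The function name and the three paired values in the example are hypothetical and are not data from the study, whose exact analysis settings the abstract does not give.

```python
from statistics import mean, stdev


def bland_altman(a: list[float], b: list[float]) -> tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement between paired methods a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    # 95% limits of agreement assume the differences are roughly normally distributed.
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical paired measurements of the same three teeth by two methods.
method_a = [10, 12, 14]
method_b = [9, 12, 12]
print(bland_altman(method_a, method_b))  # bias 1, limits about -0.96 and 2.96
```

A bias near zero with narrow limits suggests the two methods can be used interchangeably. A consistent positive bias indicates that the first method systematically yields higher values.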

    Final Report VRmed - Virtual Reality in Medical Education: A Project of the Medical Faculty of Leipzig University, Teaching Department, Media Section

    The advance of digitization is influencing the medical sciences in various areas, increasingly including medical education. The Teaching Department of the Medical Faculty of Leipzig University therefore continually examines new technical developments and their potential for use in medical teaching. The guiding principle is that digital media should supplement teaching, explicitly not replace it. Virtual reality (hereinafter 'VR') is a technology that, on first examination, promises considerable potential. To determine to what extent VR adds value to the study of human medicine and which hardware and software are suitable, the Media section of the Teaching Department of the Medical Faculty initiated the project VRmed – Virtual Reality in Medical Teaching. The project was funded as part of the Digital Fellowship Program by the University Didactic Center Saxony and the Working Group E-Learning of the LRK Saxony. This document is the project's final report, prepared on the project's own initiative. To investigate how VR could be implemented in medical studies, four VR headsets (three different models) and four VR applications were purchased: two simulation applications, i:medtasim and StepVR, and two anatomy applications, 3D Organon VR Anatomy and Medicalholodeck. Because of pandemic-related restrictions, the originally planned extensive multi-stage evaluation with lecturers and students could not be carried out in 2020/2021 and was therefore scaled back. Instead, lecturers and media didacticians evaluated the hardware and software qualitatively and in depth at three presentation events. The simulation applications in particular were judged to be helpful and useful extensions to teaching.
The anatomy application 3D Organon VR Anatomy could also be used to good effect in medical studies, especially in the early semesters. For i:medtasim, there are initial considerations about including it in the curriculum as part of a medical elective. Another prospect is the establishment of a VR lab in which students and lecturers can use the technology freely. It should also be noted that VR involves many technical challenges, and both setup and first use require expertise. In addition, purchasing is costly, and hardware and software evolve very quickly. Nevertheless, the potential and added value outweigh these drawbacks. VR can accommodate a wide range of learning types, practice scenarios bridge the gap between theory and practice, and students and lecturers can keep pace with technical developments.
Contents: 1. Introduction 2. Theoretical Background 3. VR at Medical Faculties and Universities outside Leipzig 4. Project Description VRmed – Virtual Reality in Medical Education Leipzig 5. VR Hardware and Software for Medical Use 6. Evaluation 7. Conclusion and Outlook 8. Bibliography 9. Online Sources; Appendix: A) Project Structure Plan B) Schedule C) Poster D) Evaluation Protocols

    Large Language Models as Generalizable Policies for Embodied Tasks

    We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take text instructions and visual egocentric observations as input and to output actions directly in the environment. Using reinforcement learning, we train LLaRP to see and act solely through environmental interactions. We show that LLaRP is robust to complex paraphrasings of task instructions and can generalize to new tasks that require novel optimal behavior. In particular, on 1,000 unseen tasks it achieves a 42% success rate, 1.7x the success rate of other common learned baselines or zero-shot applications of LLMs. Finally, to aid the community in studying language-conditioned, massively multi-task, embodied AI problems, we release a novel benchmark, Language Rearrangement, consisting of 150,000 training and 1,000 testing tasks for language-conditioned rearrangement. Video examples of LLaRP following unseen Language Rearrangement instructions are available at https://llm-rl.github.io